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Abstract 

This paper presents a novel method for the reconstruction of a neural network connectivity 
using calcium fluorescence data. We introduce a fast unsupervised method to integrate dif¬ 
ferent networks that reconstructs structural connectivity from neuron activity. Our method 
improves the state-of-the-art reconstruction method General Transfer Entropy (GTE). We 
are able to better eliminate indirect links, improving therefore the quality of the network 
via a normalization and ensemble process of GTE and three new informative features. The 
approach is based on a simple combination of networks, which is remarkably fast. The 
performance of our approach is benchmarked on simulated time series provided at the 
connectomics challenge and also submitted at the public competition. 

Keywords: network reconstruction algorithms, elimination of indirect links, connectomes 

1. Introduction 

Understanding the general functioning of the brain and its learning capabilities as well as 
the brain structure and some of its alterations caused by disease, is a key step towards a 
treatment of epilepsy, Alzheimer’s disease and other neuropathologies. 

This could be achieved by recovering neural networks from activities. A neural network 
is a circuit formed by a group of connected (physically or by means of neural signals) neurons 
that performs a given functionality. These circuits are responsible for reflexes, senses, as 
well as more complex processes such as learning and memory. 

Thanks to fluorescence imaging, we can easily measure the activity of a group of neu¬ 
rons. The changes of fluorescence recorded from the neural tissue are proved to be directly 
corresponding to neural activity. With calcium imaging one can study the neural activity of 
a population of neurons simultaneously allowing to uncover the function of neural networks. 

But, recovering the exact wiring of the brain (connectome) including nearly 100 billion 
neurons that have on average 7000 synaptic connections to other neurons is still a daunting 
task. Hence, there is a growing need for fast and accurate methods able to reconstruct these 
networks. That is why the ChaLearn non-profit organization has proposed the connectomics 
challenge. The goal of the competition is to reconstruct the structure of a neural network 
from temporal patterns of activities of neurons. 
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2. Typical Methods 

There is a wide variety of reconstruction algorithms that are able to infer the network 
structure from time series. Even though one of the least controversial necessary criterion 
to establish a cause-effect relationship is temporal precedence, many causal inference algo¬ 
rithms only require conditional independence testing Pearl (2000), or, more recently, joint 
distribution of pairs of variables Janzing et al. (2010). The work of Clive Granger has lead 
to a framework that has received a lot of attention due to its simplicity and the successful 
results Popescu and Guyon (2013). 


2.1. Correlation with discretization 

Here, we present a quick review of the simplest method to reconstruct a network, based on 
the correlation coefficient. The correlation is a standard method to quantify the statistical 
similarity between two random variables X and Y and it is defined as: 


cov{X,Y) E[{X - fix){Y - f,Y)] 

corr[X, ¥) = - = - 

axcry cyxcry 


( 1 ) 


where fix and fiy are respectively the expected values of X and Y, and ax and ay are 
the standard deviations. 

Once we have a correlation coefficient between each pair of neurons, we can construct 
a co-activity network. If the correlation is greater than a threshold, then the neurons are 
connected in an undirected way, this strategy is presented at Butte and Kohane (2000), 
where instead of using the correlation they use Mutual Information which can be seen as a 
non-linear dependency measure. 


2.2. Generalized Transfer Entropy 

Here, we present a quick review of one of the state of the art methods, the Transfer Entropy 
(TE) Schreiber (2000) based measure of effective connectivity called Generalized Transfer 
Entropy, or GTE Stetter et al. (2012). 

The basic idea behind Granger causality to test if the observations of time series of two 
variables A and B are due to the underlying process “A causes B” rather than “B causes 
A”, is to fit different predictive models A (present time) and B (present time) as a function 
of A (past times) and B (past times). If A can be better predicted from past values of A 
than from past values of B, while B is also better predicted by A, then we have an indication 
for A being the cause of B. 

Based on this idea, several methods have been derived in order to improve the results. 
These methods incorporate the frequency domain analysis instead of a time domain analysis 
Nolte and Miiller (2010) . One recent idea is to add contemporaneous values of B to predict A 
and vice versa to take into account instantaneous causal effect, due for instance to insufficient 
time resolution Moneta and Spirtes (2005). 

Therefore, GTE can be seen as a reconstruction algorithm of causal networks based 
solely on pairwise interactions. 
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3. Our proposal: Unsupervised ensemble of CLRed pairwise features 


If A activates B and this last one activates C, is very likely that co-activity networks will 
find a strong dependency between A and C even though the latter is an indirect link. Gene 
Network Inference methods have proposed different strategies to eliminate these indirect 
links Butte and Kohane (2000); Margolin et al. (2006); Meyer et al. (2007). 

One of these strategies is the Context Likelihood or Relatedness network (CLR) method 
Faith et al. (2007). In order to do so, the method derives a score that is associated to the 
empirical distribution of the score values. Consider a score S'* j indicating the strength of an 
alleged connection between two neurons i and j. Let us call and cjj the mean and standard 
deviation of this score over all neurons connected to i. The asymmetric standardized score 
is given as: 


Zi = max 



( 2 ) 


Finally, the symmetrized score is given by: Zij = y z? + z'j. This method has a com¬ 
plexity of 0{n‘^), n being the number of neurons, and requires a symmetric matrix. 

Our unsupervised ensemble of pairwise features uses the CLR algorithm to eliminate 
indirect links and normalize the network before assembling the different CLRed pairwise 
features. With the second step we are able to eliminate more indirect links that are still 
present at one reconstructed network but not at the others. This idea comes from mod- 
ENCODE Consortium et al. (2010); Marbach et al. (2012), where the authors propose 
an algorithm to integrate different network inference methods to construct a community 
networks which is capable of stabilizing the results and recover a good network. Their 
state-of-the-art method to combine networks is based on rank averaging. The individual 
ranks of each link are added together to compute the final rank. Then, the final list is 
computed sorting these score decreasingly. This method is also known as Ranksum, and 
will be referred as RS in the paper. 

Instead of this procedure, our proposal that will be referred as CLRsum or CS is 
formulated as follows: 

N 

CS := CLR {featurej) (3) 


A description of the workflow of our network reconstruction process is available in Fig¬ 
ure 1 in the Supplementary Material (see section 6). In this case, we have used four features 
that are defined in the following subsections. 


3.1. Feature 1: Symmetrized GTE 

The first pairwise measure is a modification of the state-of-the-art method GTE (see section 
2.2), since we apply the CLR method the recovered network should be undirected. Indeed, 
the GTE method provides a non-symmetric score {gtei^j ^ gtej^i), we symmetrize it by 
taking the most conservative score recovered by GTE. This symmetrized GTE network is 
denoted as CTEsym, and is defined as follows: gteij = min{gtei^j, gtcj^i). 
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3.2. Feature 2: Correlation of the extrema of the signals 

The second pairwise measure is based on the correlation of both signals {corr{Xi, Xj)). But, 
doing so we are not able to discriminate between true regulations and indirect effects or 
light scattering effects. We observed experimentally with the provided networks and their 
respective ground-truth, that the correlation between the signals when one of both neurons 
is spiking is statistically more informative than the plain correlation. 

The quantile qk%{x) is the data value of x where we have A: % of the values of x above 
it. The higher the quantile the stronger the statistical correlation between the measure and 
the connectome network. However, in order to be able to recover a non-spurious correlation 
at least several hundreds of samples are required. First, we capture the quantile a% of both 
signals, and compute the correlation using only the points of both signals that are above 
the quantile: 


Let tk ■■= Xi{t) > g«%(W) and ti := Xj{t) > qa%{Xj) 
ctij = corr {Xi{tn),Xj{tn)) with tn ■■= tk U ti 

Computing previous equation between each pair of different neurons we obtain the CT^% 
network. 

3.3. Feature 3: mean squared error of difference signal 

The third pairwise feature that we have found experimentally, is complementary to feature 
2. Instead of computing the correlation on the spikes, this feature uses the mean squared 
error of the points where the two signals disagree the most (once both have been normalized 
by a centering and scaling). The normalization process is defined as X? := 

First, we compute the difference between the two scaled different signals (f ^ j) and 
keep the points where they differ the most. To get such particular time points, we also rely 
on an small quantile a%: 

Let fij{t) = X^{t) - X^{t) and tk := dij{t) > qa%ifi,j) 

Then X' := X/(4) X' := X]{tk) 

Once the points of interest (tk) are extracted, the mean square error between pij := X[ — 
X'- and pj^i := X'- — X[ is computed. This leads to a non-symmetrical measure. As explained 
before, CLR requires a symmetric matrix. In order to symmetrize the matrix we take the 
minimum oipfj andpA as has been done in feature 1: cdij = min {{X[ — X'-)^, {X'- — . 

This measure is computed between each pair of different neurons to obtain the MD^y^ 
network. 

3.4. Feature 4: range of difference signal 

The last pairwise measure that we have found correlated to the connectome is the range of 
the difference between two neuron signals. 

For every pair of neurons we compute the difference between the two different signals 
{i ^ j): dfi,j := A, - Xj 

Then, the measure captures the range of dfij, but in order to be robust to the presence 
of noise the range is not the difference between the largest and smallest values of d/ij. 
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but the average over the 10th maximal/minimal values of dfij. This measure is computed 
between each pair of different neurons to obtain the network RD. In order to obtain a 
similarity network, we invert the network as follows; 

RD := max{R) — R with diag{RD) = 0 (6) 


4. Experiments 

The performance of our algorithm is benchmarked on the data provided at the ChaLearn 
connectomics challenge. The data reproduces the dynamic behavior of real networks of 
cultured neurons. The simulator also includes the typical real defects of the calcium fluo¬ 
rescence technology: limited time resolution and light scattering artifacts (the activity of 
given neuron influences the measurements of nearby neurons) Stetter et al. (2012). 

The challenge provides different datasets that have distinct properties, we will use the 
datasets where the network structure is also provided, i.e, 10 big datasets of 1000 neurons 
and 5 small datasets of 100 neurons. The network inference problem can be seen as a binary 
decision problem: after the thresholding of the network provided by the algorithm, the final 
decision can be seen as a classification: for each possible pair of neurons, the algorithm 
either define a connection or not. Therefore, the performance evaluation can be assessed 
with the usual metrics of machine learning like Receiver Operating Characteristic (ROC) 
and Precision Recall (PR) curves. The ChaLearn Connectomics proposes as a global metric 
the use of the area under the ROC curve (AUC), however, ROC curves can present an 
overly optimistic view of an algorithm’s performance if there is a large skew in the class 
distribution, as typically encountered in sparse network problems. To tackle this problem, 
precision-recall (PR) curves have been proposed as an alternative to ROC curves Sabes 
and Jordan (1995). For this reason, we present in Table 1 the Area Under PR curve 
(AUPR) and compare our method with GTE and the state-of-the-art combination of these 
features Ranksum. The results of GTE are obtained with the software available online 
dherkova (2014) using as conditioning levels the values {0.05,0.10} for the iNetl-SizelOO- 
GC’s networks and {0.15,0.20} for the big datasets. We have also computed a statistical 
test to discard non-significant results. First, we compute the contribution of each link to 
the area under the curve and then we apply the Wilcoxon test on the resulting vectors 
Hollander et al. (2013). If the best result of each dataset have a p-value smaller than 5% it 
is typed in italic font and boldfaced. 

Table 1 shows the performance of our individual networks {CTq i%, MDq i%, RD) and 
we can observe that it depends on the properties of the network (high/low-connectivity or 
high/low-activity). 

We also compare our community based approach with the state-of-the-art Ranksum 
approach, which is shown at the last column. Note that the Ranksum makes use of the 
original pairwise derived networks and our method used the symmetrized GTE (denoted 
as I*). We can observe that our normalization and simple combination is able to improve 
the quality of the individual recovered networks and also improves the state-of-the-art com¬ 
munity Ranksum. As shown in the table, our approach is competitive even though our 
method does not recover a directed network as GTE does. It is worth noting that using 
AUC as a metric we obtain similar conclusions. Table 2 with AUC results is available in 
the Supplementary Material (see section 6). 
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/ 

GTE 

II 

corr 

III 

CTo.i% 

IV 

RDq. 1 % 

V 

A771o.i% 

CS (r,III,VI,V} 

RS (I, III, VI, V) 

highcc 

0.163 

0.051 

0.088 

0.166 

0.125 

0.330 

0.184 

highcon 

0.199 

0.030 

0.120 

0.125 

0.073 

0.241 

0.184 

iNetl-SizelOO-CCOlinh 

0.242 

0.117 

0.117 

0.106 

0.123 

0.180 

0.158 

iNetl-Sizel00-CC02inh 

0.247 

0.113 

0.150 

0.103 

0.120 

0.223 

0.181 

iNetl-Sizel00-CC03inh 

0.333 

0.116 

0.198 

0.130 

0.131 

0.314 

0.237 

iNetl-Sizel00-CC04inh 

0.398 

0.120 

0.206 

0.187 

0.158 

0.394 

0.297 

iNetl-Sizel00-CC05inh 

0.366 

0.120 

0.208 

0.288 

0.179 

0.423 

0.366 

iNetl-Sizel00-CC06inh 

0.538 

0.204 

0.188 

0.371 

0.318 

0.582 

0.480 

lowcc 

0.085 

0.015 

0.085 

0.031 

0.022 

0.126 

0.083 

lowcon 

0.093 

0.023 

0.025 

0.024 

0.031 

0.196 

0.125 

normal-1 

0.164 

0.028 

0.085 

0.110 

0.061 

0.251 

0.155 

normal-2 

0.169 

0.028 

0.105 

0.095 

0.048 

0.242 

0.153 

normal-3-liighrate 

0.201 

0.057 

0.037 

0.143 

0.073 

0.293 

0.181 

normal-3 

0.193 

0.025 

0.098 

0.094 

0.044 

0.260 

0.150 

normal-4-lownoise 

0.141 

0.030 

0.120 

0.086 

0.053 

0.271 

0.156 

normal-4 

0.139 

0.026 

0.085 

0.082 

0.046 

0.254 

0.140 

Mean 

0.229 

0.069 

0.120 

0.134 

0.100 

0.286 

0.202 


Table 1: Area Under Precision Recall scores for each inference method at the datasets of the 
connectomics challenge (the higher the better). The Ranksum makes use of the 
original pairwise inferred networks while our method use the symmetrized GTE 
(denoted as I*). The best statistically significant results tested with a Wilcoxon 
test are highlighted. 


Additionally to the results shown at table 1 we also have used our method in the test 
and validation networks where the network is unknown. Using the connectomics submission 
tool we obtain 0.90402 score of Area under the ROC curve, and we would have been ranked 
in the 30th position. 

As stated previously the big advantage of our method is the low complexity O(n^). 
The CPU time needed to compute the different features for the big datasets in a 2 x Intel 
Xeon E5 2670 8C (2.6 GHz), has a mean of 1282.86 minutes for the GTE, 3.31 minutes for 
CTq i%, 66.53 minutes for MDq i% and 21.17 minutes for RD. The process of CLRsum is 
almost instantaneous once we have the individual features, and therefore the computation 
time is the sum of the time needed to compute the individual features. Hence, our proposal 
improves GTE with a negligible overload of time. 

5. Conclusion 

An unsupervised network inference method for neural connectomics has been presented. 
This method improves the state-of-the-art network inference method GTE relying on CLRsum 
consensus among GTE and three new informative features. 

We have compared our method experimentally to two state-of-the-art network inference 
methods, namely GTE and correlation network, on the connectomics challenge datasets. 
The experimental results showed that our proposal is competitive with state-of-the-art 
algorithms. 
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Figure 1: Workflow of the network reconstruction process. 
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/ 

GTE 

II 

corr 

III 

CT^.1% 

IV 

R^o.1% 

V 

^-Do.1% 

CS {r,III,VI, V) 

RS [1, III, VI, V) 

highcc 

0.898 

0.755 

0.791 

0.871 

0.863 

0.891 

0.885 

highcon 

0.898 

0.595 

0.826 

0.806 

0.776 

0.920 

0.896 

iNetl-SizelOO-CCOlinh 

0.705 

0.556 

0.559 

0.513 

0.570 

0.670 

0.640 

iNetl-Sizel00-CC02inh 

0.703 

0.563 

0.653 

0.525 

0.576 

0.733 

0.684 

iNetl-Sizel00-CC03inh 

0.761 

0.582 

0.726 

0.591 

0.616 

0.814 

0.753 

iNetl-Sizel00-CC04inh 

0.789 

0.577 

0.731 

0.662 

0.646 

0.848 

0.787 

iNetl-Sizel00-CC05inh 

0.793 

0.568 

0.697 

0.776 

0.679 

0.875 

0.818 

iNetl-Sizel00-CC06inh 

0.869 

0.753 

0.666 

0.822 

0.816 

0.919 

0.884 

lowcc 

0.864 

0.571 

0.818 

0.680 

0.658 

0.883 

0.839 

lowcon 

0.733 

0.691 

0.683 

0.689 

0.690 

0.829 

0.778 

normal-1 

0.891 

0.681 

0.801 

0.837 

0.799 

0.888 

0.874 

normal-2 

0.891 

0.699 

0.830 

0.826 

0.788 

0.887 

0.877 

normal-3-highrate 

0.888 

0.785 

0.700 

0.847 

0.812 

0.879 

0.874 

normal-3 

0.883 

0.683 

0.811 

0.813 

0.774 

0.886 

0.872 

normal-4-lownoise 

0.884 

0.708 

0.808 

0.803 

0.774 

0.888 

0.866 

normal-4 

0.879 

0.681 

0.796 

0.793 

0.763 

0.885 

0.861 

Mean 

0.833 

0.653 

0.743 

0.741 

0.725 

0.856 

0.824 


Table 2; AUC scores for each inference method at the different datasets of the connectomics 
challenge. The best methods are typed in boldface. 
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